68 research outputs found

    ENCODE whole-genome data in the UCSC Genome Browser

    Get PDF
    The Encyclopedia of DNA Elements (ENCODE) project is an international consortium of investigators funded to analyze the human genome with the goal of producing a comprehensive catalog of functional elements. The ENCODE Data Coordination Center at The University of California, Santa Cruz (UCSC) is the primary repository for experimental results generated by ENCODE investigators. These results are captured in the UCSC Genome Bioinformatics database and download server for visualization and data mining via the UCSC Genome Browser and companion tools (Rhead et al. The UCSC Genome Browser Database: update 2010, in this issue). The ENCODE web portal at UCSC (http://encodeproject.org or http://genome.ucsc.edu/ENCODE) provides information about the ENCODE data and convenient links for access

    The UCSC Genome Browser database: update 2010

    Get PDF
    The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools

    The UCSC Genome Browser Database: update 2009

    Get PDF
    The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs

    The UCSC Genome Browser database: extensions and updates 2011

    Get PDF
    The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced ‘track data hubs’, which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image

    Examples of sequence conservation analyses capture a subset of mouse long non-coding RNAs sharing homology with fish conserved genomic elements

    Get PDF
    Background: Long non-coding RNAs (lncRNA) are a major class of non-coding RNAs. They are involved in diverse intra-cellular mechanisms like molecular scaffolding, splicing and DNA methylation. Through these mechanisms they are reported to play a role in cellular differentiation and development. They show an enriched expression in the brain where they are implicated in maintaining cellular identity, homeostasis, stress responses and plasticity. Low sequence conservation and lack of functional annotations make it difficult to identify homologs of mammalian lncRNAs in other vertebrates. A computational evaluation of the lncRNAs through systematic conservation analyses of both sequences as well as their genomic architecture is required.Results: Our results show that a subset of mouse candidate lncRNAs could be distinguished from random sequences based on their alignment with zebrafish phastCons elements. Using ROC analyses we were able to define a measure to select significantly conserved lncRNAs. Indeed, starting from ~2,800 mouse lncRNAs we could predict that between 4 and 11% present conserved sequence fragments in fish genomes. Gene ontology (GO) enrichment analyses of protein coding genes, proximal to the region of conservation, in both organisms highlighted similar GO classes like regulation of transcription and central nervous system development. The proximal coding genes in both the species show enrichment of their expression in brain. In summary, we show that interesting genomic regions in zebrafish could be marked based on their sequence homology to a mouse lncRNA, overlap with ESTs and proximity to genes involved in nervous system development.Conclusions: Conservation at the sequence level can identify a subset of putative lncRNA orthologs. The similar protein-coding neighborhood and transcriptional information about the conserved candidates provide support to the hypothesis that they share functional homology. The pipeline herein presented represents a proof of principle showing that a portion between 4 and 11% of lncRNAs retains region of conservation between mammals and fishes. We believe this study will result useful as a reference to analyze the conservation of lncRNAs in newly sequenced genomes and transcriptomes. \uc2\ua9 2013 Basu et al.; licensee BioMed Central Ltd

    The Tetraodon nigroviridis reference transcriptome: Developmental transition, length retention and microsynteny of long non-coding RNAs in a compact vertebrate genome

    Get PDF
    Pufferfish such as fugu and tetraodon carry the smallest genomes among all vertebrates and are ideal for studying genome evolution. However, comparative genomics using these species is hindered by the poor annotation of their genomes. We performed RNA sequencing during key stages of maternal to zygotic transition of Tetraodon nigroviridis and report its first developmental transcriptome. We assembled 61,033 transcripts (23,837 loci) representing 80% of the annotated gene models and 3816 novel coding transcripts from 2667 loci. We demonstrate the similarities of gene expression profiles between pufferfish and zebrafish during maternal to zygotic transition and annotated 1120 long non-coding RNAs (lncRNAs) many of which differentially expressed during development. The promoters for 60% of the assembled transcripts result validated by CAGE-seq. Despite the extreme compaction of the tetraodon genome and the dramatic loss of transposons, the length of lncRNA exons remain comparable to that of other vertebrates and a small set of lncRNAs appears enriched for transposable elements suggesting a selective pressure acting on lncRNAs length and composition. Finally, a set of lncRNAs are microsyntenic between teleost and vertebrates, which indicates potential regulatory interactions between lncRNAs and their flanking coding genes. Our work provides a fundamental molecular resource for vertebrate comparative genomics and embryogenesis studies

    Deep RNA Sequencing Reveals Novel Cardiac Transcriptomic Signatures for Physiological and Pathological Hypertrophy

    Get PDF
    Although both physiological hypertrophy (PHH) and pathological hypertrophy (PAH) of the heart have similar morphological appearances, only PAH leads to fatal heart failure. In the present study, we used RNA sequencing (RNA-Seq) to determine the transcriptomic signatures for both PHH and PAH. Approximately 13–20 million reads were obtained for both models, among which PAH showed more differentially expressed genes (DEGs) (2,041) than PHH (245). The expression of 417 genes was barely detectable in the normal heart but was suddenly activated in PAH. Among them, Foxm1 and Plk1 are of particular interest, since Ingenuity Pathway Analysis (IPA) using DEGs and upstream motif analysis showed that they are essential hub proteins that regulate the expression of downstream proteins associated with PAH. Meanwhile, 52 genes related to collagen, chemokines, and actin showed opposite expression patterns between PHH and PAH. MAZ-binding motifs were enriched in the upstream region of the participating genes. Alternative splicing (AS) of exon variants was also examined using RNA-Seq data for PAH and PHH. We found 317 and 196 exon inclusions and exon exclusions, respectively, for PAH, and 242 and 172 exon inclusions and exclusions, respectively for PHH. The AS pattern was mostly related to gains or losses of domains, changes in activity, and localization of the encoded proteins. The splicing variants of 8 genes (i.e., Fhl1, Rcan1, Ndrg2, Synpo, Ttll1, Cxxc5, Egfl7, and Tmpo) were experimentally confirmed. Multilateral pathway analysis showed that the patterns of quantitative (DEG) and qualitative (AS) changes differ depending on the type of pathway in PAH and PHH. One of the most significant changes in PHH is the severe downregulation of autoimmune pathways accompanied by significant AS. These findings revealed the unique transcriptomic signatures of PAH and PHH and also provided a more comprehensive understanding at both the quantitative and qualitative levels

    The Impact of Recombination on Nucleotide Substitutions in the Human Genome

    Get PDF
    Unraveling the evolutionary forces responsible for variations of neutral substitution patterns among taxa or along genomes is a major issue for detecting selection within sequences. Mammalian genomes show large-scale regional variations of GC-content (the isochores), but the substitution processes at the origin of this structure are poorly understood. We analyzed the pattern of neutral substitutions in 1 Gb of primate non-coding regions. We show that the GC-content toward which sequences are evolving is strongly negatively correlated to the distance to telomeres and positively correlated to the rate of crossovers (R2 = 47%). This demonstrates that recombination has a major impact on substitution patterns in human, driving the evolution of GC-content. The evolution of GC-content correlates much more strongly with male than with female crossover rate, which rules out selectionist models for the evolution of isochores. This effect of recombination is most probably a consequence of the neutral process of biased gene conversion (BGC) occurring within recombination hotspots. We show that the predictions of this model fit very well with the observed substitution patterns in the human genome. This model notably explains the positive correlation between substitution rate and recombination rate. Theoretical calculations indicate that variations in population size or density in recombination hotspots can have a very strong impact on the evolution of base composition. Furthermore, recombination hotspots can create strong substitution hotspots. This molecular drive affects both coding and non-coding regions. We therefore conclude that along with mutation, selection and drift, BGC is one of the major factors driving genome evolution. Our results also shed light on variations in the rate of crossover relative to non-crossover events, along chromosomes and according to sex, and also on the conservation of hotspot density between human and chimp
    corecore